Descriptive Clustering as a Method for Exploring Text Collections

Author

  • Dawid Weiss
Abstract

Descriptive k-Means, in turn, uses frequent phrase extraction, noun phrases, and clustering with the k-Means algorithm. The work presents computational experiments for both algorithms. The experimental results compare clustering quality (understood as the ability to reconstruct a known assignment of documents to groups) achieved with Lingo and Descriptive k-Means against their closest counterparts in the literature: the Suffix Tree Clustering and k-Means algorithms. Another important practical aspect of the evaluation is the presentation of data collected from the public version of the Carrot2 system, which is available as free software.
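The abstract describes clustering quality as how well an algorithm reconstructs a known assignment of documents to groups, without naming the measure used. As a rough illustration only, the sketch below computes purity, one common measure of that kind; the choice of purity, the function name `purity`, and the toy labels are all assumptions and are not taken from the work itself.

```python
from collections import Counter


def purity(predicted, reference):
    """Fraction of documents that belong to the majority reference group of their cluster.

    Assumption: purity stands in for the (unnamed) quality measure in the abstract.
    """
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not predicted:
        return 0.0
    # Count the reference groups that occur inside each predicted cluster.
    members = {}
    for cluster, group in zip(predicted, reference):
        members.setdefault(cluster, Counter())[group] += 1
    # Each cluster contributes the size of its largest reference group.
    matched = sum(counts.most_common(1)[0][1] for counts in members.values())
    return matched / len(predicted)


# Hypothetical toy data: six documents with two known groups.
reference = ["sport", "sport", "sport", "politics", "politics", "politics"]
predicted = [0, 0, 1, 1, 1, 1]
score = purity(predicted, reference)
```

A score of 1.0 means every cluster contains documents from a single known group; lower values mean the clusters mix groups.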

Similar resources

EventRiver: An Event-Based Visual Analytics Approach to Exploring Large Text Collections with a Temporal Focus

Many Text Collections with a Temporal Focus (TCTFs), such as news corpora and weblogs, are generated to report and discuss real life events. Thus Event-Related Tasks (ERTs), such as detecting the real life events driving the text, tracking their evolution, and investigating the reports and discussions around these events, are important when exploring such text collections. In this paper, we pro...

Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...

A Probabilistic Hierarchical Clustering Method for Organising Collections of Text Documents

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been termed as symmetric and asymmetric models. For text data specifically both asymmetric and sy...

Probabilistic Hierarchical Clustering Method for Organizing Collections of Text Documents

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been termed as symmetric and asymmetric models. For text data specifically both asymmetric and sy...

Constrained Text Clustering Using Word Trigrams

In recent years the field of Constrained Clustering has emerged, proposing clustering algorithms that can accommodate domain information to obtain a better final grouping. This information is usually provided as pairwise constraints, whose acquisition from humans can be costly. In this paper we propose a novel method based on word n-grams to automatically extract positive co...

Publication date: 2006